Preparing sampling

At this point, let's just take uniform random samples. In the future I can change this so that I decide what proportion of each label to pick, that is, by setting a weight for each label.
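As a rough sketch of what per-label weighting could look like, the function below falls back to plain random sampling when no weights are given and otherwise splits the sample across labels in proportion to the weights. The name `sample_by_label`, its parameters, and the use of the standard-library `random` module are all assumptions made for illustration; this is not the sampler used in this notebook.

```python
import random
from collections import defaultdict

def sample_by_label(labels, n_samples, weights=None, seed=0):
    """Return indices into `labels` for a random sample (hypothetical helper).

    weights: optional dict mapping label -> relative weight. When omitted,
    indices are drawn uniformly at random without replacement.
    """
    rng = random.Random(seed)  # private generator, seeded explicitly
    if weights is None:
        return rng.sample(range(len(labels)), n_samples)

    # Group the row indices by their label.
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    total = sum(weights.values())
    picked = []
    for label, w in weights.items():
        candidates = by_label.get(label, [])
        # Share of the sample for this label, capped by what is available.
        k = min(round(n_samples * w / total), len(candidates))
        picked.extend(rng.sample(candidates, k))
    return picked

# Example: ask for three times as many rows of label 0 as of label 1.
labels = [0] * 100 + [1] * 100
idx = sample_by_label(labels, 20, weights={0: 3, 1: 1}, seed=1)
```

Because each label's share is rounded separately, the total can differ from `n_samples` by a row or two for some weight combinations.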

It is important to note that, for a random kernel to be truly random, we need to pass the random seed to it explicitly. Otherwise, all kernels (such as get_sample below) start from the same random state, which leads to the same results on different runs.
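The idea of passing the seed explicitly can be sketched as follows. The `get_sample` signature here (`items`, `n`, `seed`) is an assumption made for illustration and may not match the real function in this notebook; the sketch also uses the standard-library `random` module rather than whatever generator the actual kernels rely on.

```python
import random

def get_sample(items, n, seed):
    """Draw n items using a generator seeded by the caller (assumed signature)."""
    rng = random.Random(seed)  # state is private to this call
    return rng.sample(items, n)

items = list(range(1000))

# Same seed: identical sample, which is what happens when every
# call silently starts from the same random state.
same_a = get_sample(items, 10, seed=1)
same_b = get_sample(items, 10, seed=1)

# A fresh seed per call, drawn here from the OS entropy source,
# makes each call (and each run) start from a different state.
seed_source = random.SystemRandom()
fresh = get_sample(items, 10, seed=seed_source.randrange(2**32))
```

Recording the seed that was passed in also keeps any particular run reproducible later.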

Next Step

XY is saved in ./cleaned_data.pkl.

Model v02

Model v03